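Heuristic search has traditionally relied on hand-crafted or programmatically derived heuristics. Neural networks (NNs) are newer, powerful tools that can be used to learn complex mappings from states to cost-to-go heuristics. However, their slow single-inference time is a large overhead that can substantially slow down planning in optimized heuristic search implementations. Several recent works have described ways to exploit the batch computation of NNs to reduce this overhead in planning while retaining bounds on (sub)optimality. However, all of these methods use the NN heuristic in a "blocking" manner while building up their batches, and they ignore the fast-to-compute admissible heuristics (e.g. existing classically derived heuristics) that are usually available. We introduce Non-Blocking Batch A* (NBBA*), a bounded-suboptimal method that lazily computes the NN heuristic in batches while allowing expansions informed by a non-NN heuristic. We show how this subtle but important change can lead to substantial reductions in expansions compared to the current blocking alternative, and we observe that the performance is related to the information difference between the batch-computed NN heuristic and the fast non-NN heuristic.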
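Conflict-Based Search (CBS) is a popular multi-agent path finding (MAPF) solver that employs a low-level single-agent planner and a high-level constraint tree to resolve conflicts. The vast majority of modern MAPF solvers focus on improving CBS by reducing the size of this tree through various strategies, and few methods modify the low-level planner. All low-level planners in existing CBS methods use an unweighted cost-to-go heuristic, and suboptimal CBS methods additionally use a conflict heuristic to help the high-level search. Contrary to prevailing beliefs, we show that the cost-to-go heuristic can be used significantly more effectively by weighting it in a specific manner alongside the conflict heuristic. We introduce two variants of doing so and demonstrate that this change can lead to 2-100x speedups in certain scenarios. Additionally, to the best of our knowledge, we show the first theoretical relation between prioritized planning and bounded-suboptimal CBS, and we demonstrate that our methods are their natural generalization.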
How do we design measures of social bias that we trust? While prior work has introduced several measures, no measure has gained widespread trust: instead, mounting evidence argues we should distrust these measures. In this work, we design bias measures that warrant trust based on the cross-disciplinary theory of measurement modeling. To combat the frequently fuzzy treatment of social bias in NLP, we explicitly define social bias, grounded in principles drawn from social science research. We operationalize our definition by proposing a general bias measurement framework DivDist, which we use to instantiate 5 concrete bias measures. To validate our measures, we propose a rigorous testing protocol with 8 testing criteria (e.g. predictive validity: do measures predict biases in US employment?). Through our testing, we demonstrate considerable evidence to trust our measures, showing they overcome conceptual, technical, and empirical deficiencies present in prior measures.
Evaluation is the central means for assessing, understanding, and communicating about NLP models. In this position paper, we argue evaluation should be more than that: it is a force for driving change, carrying a sociological and political character beyond its technical dimensions. As a force, evaluation's power arises from its adoption: under our view, evaluation succeeds when it achieves the desired change in the field. Further, by framing evaluation as a force, we consider how it competes with other forces. Under our analysis, we conjecture that the current trajectory of NLP suggests evaluation's power is waning, in spite of its potential for realizing more pluralistic ambitions in the field. We conclude by discussing the legitimacy of this power, who acquires this power and how it distributes. Ultimately, we hope the research community will more aggressively harness evaluation for change.
Many real-world applications of language models (LMs), such as code autocomplete and writing assistance, involve human-LM interaction, but the main LM benchmarks are non-interactive: a system produces output without human intervention. To evaluate human-LM interaction, we develop a framework, Human-AI Language-based Interaction Evaluation (H-LINE), that expands non-interactive evaluation along three dimensions, capturing (i) the interactive process, not only the final output; (ii) the first-person subjective experience, not just a third-party assessment; and (iii) notions of preference beyond quality. We then design five tasks ranging from goal-oriented to open-ended to capture different forms of interaction. On four state-of-the-art LMs (three variants of OpenAI's GPT-3 and AI21's J1-Jumbo), we find that better non-interactive performance does not always result in better human-LM interaction and that first-person and third-party metrics can diverge, suggesting the importance of examining the nuances of human-LM interaction.
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
Tuberculosis (TB), an infectious bacterial disease, is a significant cause of death, especially in low-income countries, with an estimated ten million new cases reported globally in $2020$. While TB is treatable, non-adherence to the medication regimen is a significant cause of morbidity and mortality. Thus, proactively identifying patients at risk of dropping off their medication regimen enables corrective measures to mitigate adverse outcomes. Using a proxy measure of extreme non-adherence and a dataset of nearly $700,000$ patients from four states in India, we formulate and solve the machine learning (ML) problem of early prediction of non-adherence based on a custom rank-based metric. We train ML models and evaluate against baselines, achieving a $\sim 100\%$ lift over rule-based baselines and $\sim 214\%$ over a random classifier, taking into account country-wide large-scale future deployment. We deal with various issues in the process, including data quality, high-cardinality categorical data, low target prevalence, distribution shift, variation across cohorts, algorithmic fairness, and the need for robustness and explainability. Our findings indicate that risk stratification of non-adherent patients is a viable, deployable-at-scale ML solution.
Variational quantum algorithms (VQAs) utilize a hybrid quantum-classical architecture to recast problems of high-dimensional linear algebra as ones of stochastic optimization. Despite the promise of leveraging near- to intermediate-term quantum resources to accelerate this task, the computational advantage of VQAs over wholly classical algorithms has not been firmly established. For instance, while the variational quantum eigensolver (VQE) has been developed to approximate low-lying eigenmodes of high-dimensional sparse linear operators, analogous classical optimization algorithms exist in the variational Monte Carlo (VMC) literature, utilizing neural networks in place of quantum circuits to represent quantum states. In this paper we ask if classical stochastic optimization algorithms can be constructed paralleling other VQAs, focusing on the example of the variational quantum linear solver (VQLS). We find that such a construction can be applied to the VQLS, yielding a paradigm that could theoretically extend to other VQAs of similar form.
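Since COVID-19 vaccines became available, no studies have quantified how different disaster evacuation strategies can mitigate pandemic risks in shelters. We therefore applied an age-structured epidemiological model, the Susceptible-Exposed-Infectious-Recovered (SEIR) model, to investigate to what extent different vaccine uptake levels and the diversion protocol implemented in Taiwan decrease infections and delay pandemic peak occurrences. Taiwan's diversion protocol involves diverting people who are in self-quarantine due to exposure, thus preventing them from mingling with the general public at a congregate shelter. The diversion protocol, combined with sufficient vaccine uptake, can decrease the maximum number of infections and delay outbreaks relative to scenarios without such strategies. When diverting all exposed people is not possible, or when vaccine uptake is insufficient, the diversion protocol is still valuable. Furthermore, a group of evacuees that consists primarily of young adults tends to experience the pandemic peak sooner and to have more infections than a majority-elderly group when the diversion protocol is implemented. However, when the diversion protocol is not enforced, the majority-elderly group is worse off than the majority-young-adult group by up to 20%.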
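To make the compartmental mechanics concrete, the following is a minimal sketch of a basic SEIR simulation. It is not the study's model: it ignores age structure and vaccination, uses simple forward-Euler integration, and every parameter value and function name (`simulate_seir`, `peak_infections`) is a hypothetical choice for illustration. Diversion is approximated, as an assumption, by removing the initially exposed people from the shelter population.

```python
def simulate_seir(beta, sigma, gamma, s0, e0, i0, r0, days, dt=0.1):
    """Integrate a basic (non-age-structured) SEIR model with forward Euler.

    beta: transmission rate, sigma: 1 / incubation period,
    gamma: 1 / infectious period (all per day, illustrative values only).
    Returns one (S, E, I, R) tuple per day, starting with day 0.
    """
    n = s0 + e0 + i0 + r0
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    steps_per_day = round(1 / dt)
    history = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt   # S -> E
            new_infectious = sigma * e * dt       # E -> I
            new_recovered = gamma * i * dt        # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        history.append((s, e, i, r))
    return history


def peak_infections(history):
    """Return (day, infectious count) at the day with the most infectious people."""
    day = max(range(len(history)), key=lambda d: history[d][2])
    return day, history[day][2]


# Hypothetical shelter of 1000 evacuees, 10 of whom arrive already exposed.
baseline = simulate_seir(0.5, 0.2, 0.1, s0=989, e0=10, i0=1, r0=0, days=200)
# Assumed diversion scenario: the 10 exposed people are housed elsewhere.
diverted = simulate_seir(0.5, 0.2, 0.1, s0=989, e0=0, i0=1, r0=0, days=200)

print("baseline peak (day, infectious):", peak_infections(baseline))
print("diverted peak (day, infectious):", peak_infections(diverted))
```

In this toy setting, removing the initially exposed group mainly delays the peak; reproducing the study's age-dependent and vaccine-dependent effects would require separate compartments per age group and per vaccination status.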
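Polyculture farming has environmental advantages but requires substantially more pruning than monoculture farming. We present novel hardware and algorithms for automated pruning. Using an overhead camera to collect data from a physical-scale garden testbed, the autonomous system uses a learned plant phenotyping convolutional neural network and a bounding disk tracking algorithm to evaluate the individual plant distribution and estimate the state of the garden each day. From this garden state, AlphaGardenSim selects plants to prune autonomously. A trained neural network detects and targets specific prune points on the plant. Two custom-designed pruning tools, compatible with an agricultural-robot gantry system, are experimentally evaluated and execute autonomous cuts through controlled algorithms. We present results for four 60-day garden cycles. The results suggest that the system can autonomously achieve a normalized plant diversity of 0.94 with pruning shears while maintaining an average canopy coverage of 0.84 by the end of the cycles. For code, videos, and datasets, see https://sites.google.com/berkeley.edu/pruningpolyculture.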